GLaMM-FullScope is a large multimodal model that combines all of GLaMM's capabilities: scene dialogue generation, referring expression segmentation, region-level image description, image-level description, and visual question answering.
Text-to-Image
Transformers